Communication Efficient Coresets for Empirical Loss Minimization
Authors
Abstract
In this paper, we study the problem of empirical loss minimization with l2-regularization in distributed settings with significant communication cost. Stochastic gradient descent (SGD) and its variants are popular techniques for solving these problems in large-scale applications. However, the communication cost of these techniques is usually high, which leads to considerable performance degradation. We introduce a novel approach that reduces the communication cost while retaining good convergence properties. The key to our approach is to construct a small summary of the data, called a coreset, at each iteration and to solve an easy optimization problem based on this coreset. We present a general framework for analyzing coreset-based optimization and provide interesting insights into existing algorithms from this perspective. We then propose a new coreset construction and provide its convergence analysis for a wide class of problems that includes logistic regression and support vector machines. Preliminary experiments show encouraging results for our algorithm on real-world datasets.
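To make the general idea concrete, the following Python sketch runs a coreset-style loop for l2-regularized logistic regression: at each outer round it importance-samples a small weighted subset, using the current per-example losses as a crude sensitivity proxy, and then optimizes the weighted objective on that subset only. The sampling scores, the helper names, and the gradient-descent inner solver are illustrative assumptions and are not the construction proposed in the paper.

    import numpy as np

    def per_example_loss(w, X, y):
        # Logistic loss of each example at the current iterate w.
        return np.logaddexp(0.0, -y * (X @ w))

    def build_coreset(w, X, y, m, rng):
        # Importance-sample m points with probability proportional to their
        # current loss (a simple sensitivity proxy, NOT the paper's method),
        # and reweight so the coreset sum is an unbiased estimate of the
        # full-data loss sum.
        scores = per_example_loss(w, X, y)
        p = scores / scores.sum()
        idx = rng.choice(len(y), size=m, replace=True, p=p)
        return X[idx], y[idx], 1.0 / (m * p[idx])

    def solve_on_coreset(w0, Xc, yc, wc, lam, steps=200, lr=0.1):
        # Plain gradient descent on the weighted coreset objective
        # (1 / sum(wc)) * sum_i wc_i * loss_i(w) + (lam / 2) * ||w||^2.
        w, n = w0.copy(), wc.sum()
        for _ in range(steps):
            z = np.clip(yc * (Xc @ w), -30, 30)
            g = -((wc * yc / (1.0 + np.exp(z))) @ Xc) / n + lam * w
            w -= lr * g
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10000, 20))
    y = np.sign(X @ rng.normal(size=20) + 0.1 * rng.normal(size=10000))

    w, lam = np.zeros(20), 1e-2
    for _ in range(10):                 # outer rounds: rebuild coreset, re-solve
        Xc, yc, wc = build_coreset(w, X, y, m=500, rng=rng)
        w = solve_on_coreset(w, Xc, yc, wc, lam)
    print(per_example_loss(w, X, y).mean() + 0.5 * lam * w @ w)

Each inner solve only touches the m coreset points, which is what keeps the per-iteration work, and in a distributed setting the communication, small.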
Similar resources
Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss
We consider distributed convex optimization problems originated from sample average approximation of stochastic optimization, or empirical risk minimization in machine learning. We assume that each machine in the distributed computing system has access to a local empirical loss function, constructed with i.i.d. data sampled from a common distribution. We propose a communication-efficient distri...
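For reference, the objective shared by this line of work is regularized empirical risk minimization with the N data points partitioned across m machines; in notation assumed here rather than taken from the truncated abstract, machine k holds the index set S_k and the global problem is

    \min_{w \in \mathbb{R}^d} \; F(w) \;=\; \frac{1}{N} \sum_{k=1}^{m} \sum_{i \in S_k} \phi(w; x_i, y_i) \;+\; \frac{\lambda}{2} \lVert w \rVert_2^2 ,

where \phi is a convex loss such as the logistic or hinge loss and \lambda > 0 is the l2-regularization parameter. Each machine can evaluate only its own partial sum, so every evaluation of the full gradient or Hessian-vector product costs one round of communication.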
Efficient Empirical Risk Minimization with Smooth Loss Functions in Non-interactive Local Differential Privacy
In this paper, we study the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. We first show that if the ERM loss function is (∞, T)-smooth, then we can avoid a dependence of the sample complexity, to achieve error α, on the exponential of the dimensionality p with base 1/α (i.e., α^{-p}), which answers a question in (Smith et al., 2017). Our approach ...
DiSCO: Distributed Optimization for Self-Concordant Empirical Loss
We propose a new distributed algorithm for empirical risk minimization in machine learning. The algorithm is based on an inexact damped Newton method, where the inexact Newton steps are computed by a distributed preconditioned conjugate gradient method. We analyze its iteration complexity and communication efficiency for minimizing self-concordant empirical loss functions, and discuss the resul...
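As a rough, single-process illustration of the kind of step this abstract describes (the details below are assumptions, not the authors' implementation), the sketch computes an inexact Newton direction for l2-regularized logistic regression with conjugate gradient, where every Hessian-vector product is formed by aggregating per-partition products, the operation that would cost one communication round per CG iteration, and then applies a damped update based on the estimated Newton decrement.

    import numpy as np

    def local_grad(w, Xk, yk, lam):
        # Gradient of the local regularized logistic loss on one partition.
        s = 1.0 / (1.0 + np.exp(np.clip(yk * (Xk @ w), -30, 30)))   # sigmoid(-z)
        return -((yk * s) @ Xk) / len(yk) + lam * w

    def local_hess_vec(w, Xk, yk, lam, v):
        # Hessian-vector product of the same local objective.
        s = 1.0 / (1.0 + np.exp(np.clip(yk * (Xk @ w), -30, 30)))
        d = s * (1.0 - s)                                            # logistic curvature
        return Xk.T @ (d * (Xk @ v)) / len(yk) + lam * v

    def damped_newton_step(w, parts, lam, cg_iters=20):
        # Inexact Newton direction v ~ H^{-1} g via conjugate gradient; each
        # aggregated Hessian-vector product stands in for one communication round.
        g = np.mean([local_grad(w, Xk, yk, lam) for Xk, yk in parts], axis=0)
        v = np.zeros_like(w)
        r = g.copy()
        p = r.copy()
        for _ in range(cg_iters):
            Hp = np.mean([local_hess_vec(w, Xk, yk, lam, p) for Xk, yk in parts], axis=0)
            alpha = (r @ r) / (p @ Hp)
            v += alpha * p
            r_new = r - alpha * Hp
            p = r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        Hv = np.mean([local_hess_vec(w, Xk, yk, lam, v) for Xk, yk in parts], axis=0)
        delta = np.sqrt(v @ Hv)              # approximate Newton decrement
        return w - v / (1.0 + delta)         # damped update

    rng = np.random.default_rng(1)
    X = rng.normal(size=(8000, 10))
    y = np.sign(X @ rng.normal(size=10) + 0.1 * rng.normal(size=8000))
    parts = list(zip(np.array_split(X, 4), np.array_split(y, 4)))    # 4 "machines"

    w = np.zeros(10)
    for _ in range(5):
        w = damped_newton_step(w, parts, lam=1e-3)

The damping factor 1/(1 + delta) is the standard step size for damped Newton on self-concordant objectives, which is what keeps the iteration stable even when the CG solve is only approximate.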
Distributed Inexact Damped Newton Method: Data Partitioning and Load-Balancing
In this paper we study the inexact damped Newton method implemented in a distributed environment. We start with the original DiSCO algorithm [Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss, Yuchen Zhang and Lin Xiao, 2015]. We show that this algorithm may not scale well and propose algorithmic modifications which lead to less communication, better lo...
Training Support Vector Machines using Coresets
Note: This work was done as a course project as part of an ongoing research effort that was recently submitted [2]. The submission, done in collaboration with Murad Tukan, Dan Feldman, and Daniela Rus [2], supersedes the work in this manuscript. We present a novel coreset construction algorithm for solving classification tasks using Support Vector Machines (SVMs) in a computationally efficient ...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2015